Primary Data Sources
Our analysis draws on a dataset built from three groups of sources (on-chain data, exchange price data, and auxiliary network data) so that results can be cross-checked and replicated:
Blockchain Data: We operate a full Bitcoin Core node (v24.0+) with transaction indexing enabled (txindex=1), giving access to every block since genesis. Raw blockchain data is parsed and stored in a PostgreSQL database organized into the following tables:
- Blocks table: 750,000+ blocks with timestamps, size, weight, and mining metadata
- Transactions table: 800M+ transactions with fees, version, and locktime data
- Inputs/Outputs tables: 2B+ records enabling complete UTXO tracking
- Mempool snapshots: 10-minute intervals capturing fee distribution and congestion
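To make the table layout and the UTXO tracking concrete, the following is a minimal sketch rather than the production schema. It uses an in-memory SQLite database as a stand-in for PostgreSQL, and every table name, column name, and helper function (build_demo_db, unspent_outputs) is an illustrative assumption. The unspent set is derived as the outputs that no input references.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative assumptions,
# and SQLite stands in for the PostgreSQL database described in the text.
SCHEMA = """
CREATE TABLE blocks (
    height    INTEGER PRIMARY KEY,
    hash      TEXT NOT NULL UNIQUE,
    timestamp INTEGER NOT NULL,   -- block header time, Unix seconds
    size      INTEGER NOT NULL,   -- bytes
    weight    INTEGER NOT NULL    -- weight units
);
CREATE TABLE transactions (
    txid         TEXT PRIMARY KEY,
    block_height INTEGER NOT NULL REFERENCES blocks(height),
    version      INTEGER NOT NULL,
    locktime     INTEGER NOT NULL,
    fee          INTEGER NOT NULL  -- satoshis
);
CREATE TABLE outputs (
    txid  TEXT NOT NULL REFERENCES transactions(txid),
    vout  INTEGER NOT NULL,
    value INTEGER NOT NULL,        -- satoshis
    PRIMARY KEY (txid, vout)
);
CREATE TABLE inputs (
    txid      TEXT NOT NULL REFERENCES transactions(txid),  -- spending tx
    vin       INTEGER NOT NULL,
    prev_txid TEXT NOT NULL,       -- output being spent
    prev_vout INTEGER NOT NULL,
    PRIMARY KEY (txid, vin)
);
"""

# An output is unspent when no input row points back at it.
UTXO_QUERY = """
SELECT o.txid, o.vout, o.value
FROM outputs AS o
LEFT JOIN inputs AS i
       ON i.prev_txid = o.txid AND i.prev_vout = o.vout
WHERE i.txid IS NULL
ORDER BY o.txid, o.vout
"""


def build_demo_db():
    """Create the sketch schema and load a tiny, made-up two-block chain."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO blocks VALUES (?, ?, ?, ?, ?)",
        [(1, "h1", 1700000000, 1000, 4000), (2, "h2", 1700000600, 1000, 4000)],
    )
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?, ?, ?, ?)",
        [("a", 1, 2, 0, 0), ("b", 2, 2, 0, 500)],
    )
    conn.executemany(
        "INSERT INTO outputs VALUES (?, ?, ?)",
        [("a", 0, 5000), ("a", 1, 3000), ("b", 0, 4500)],
    )
    # Transaction "b" spends output a:0 (5000 sat) and pays a 500 sat fee.
    conn.execute("INSERT INTO inputs VALUES ('b', 0, 'a', 0)")
    return conn


def unspent_outputs(conn):
    """Return (txid, vout, value) for every output not yet spent."""
    return conn.execute(UTXO_QUERY).fetchall()
```

On the demo data, unspent_outputs(build_demo_db()) returns outputs a:1 and b:0, since a:0 has been consumed by transaction b. At the scale described above, a real deployment would likely need indexes on (prev_txid, prev_vout) or a maintained spent flag rather than a full anti-join.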
Price Data: Tick-level trade data from Binance, Coinbase, and Kraken, aggregated to 1-minute OHLCV bars. A volume-weighted average price (VWAP) is calculated for each block interval (nominally 10 minutes) so that prices align with blockchain timestamps. Outlier detection filters flash crashes and exchange-specific anomalies.
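The alignment step can be sketched as follows. This is an illustration under stated assumptions, not the filter or aggregation actually used: bars are assumed to be (timestamp, exchange, price, volume) tuples, outliers are assumed to be bars whose price deviates from the same-minute cross-exchange median by more than a threshold (max_rel_dev is a hypothetical parameter), and each bar is assigned to the interval ending at the next block timestamp. Block header timestamps are not guaranteed to be monotonic, so the sketch sorts them first.

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import median


def drop_cross_exchange_outliers(bars, max_rel_dev=0.02):
    """Keep bars within max_rel_dev of the same-minute cross-exchange median.

    bars: iterable of (timestamp, exchange, price, volume) tuples.
    The median rule and the 2% default are assumptions for illustration.
    """
    by_minute = defaultdict(list)
    for bar in bars:
        by_minute[bar[0]].append(bar)

    kept = []
    for ts in sorted(by_minute):
        group = by_minute[ts]
        mid = median(b[2] for b in group)
        kept.extend(b for b in group if abs(b[2] - mid) <= max_rel_dev * mid)
    return kept


def block_interval_vwap(bars, block_times):
    """VWAP over each interval (previous block time, block time].

    Returns {block_time: vwap}. Bars at or before the first block time, or
    after the last one, fall in no complete interval and are ignored.
    """
    times = sorted(block_times)
    price_volume = defaultdict(float)
    volume_sum = defaultdict(float)

    for ts, _exchange, price, volume in bars:
        k = bisect_left(times, ts)  # index of first block time >= ts
        if k == 0 or k == len(times):
            continue
        price_volume[times[k]] += price * volume
        volume_sum[times[k]] += volume

    return {
        t: price_volume[t] / volume_sum[t]
        for t in times
        if volume_sum[t] > 0
    }


# Made-up example data: three exchanges in one minute, two in another.
demo_bars = [
    (60, "A", 100.0, 1.0),
    (60, "B", 101.0, 1.0),
    (60, "C", 150.0, 5.0),   # far from the other two venues
    (660, "A", 110.0, 2.0),
    (660, "B", 112.0, 2.0),
]
demo_blocks = [0, 600, 1200]
demo_vwap = block_interval_vwap(drop_cross_exchange_outliers(demo_bars), demo_blocks)
```

In this example the 150.0 bar is dropped before aggregation, so the interval ending at 600 is priced from the two remaining venues only. A cross-exchange median needs at least three venues per minute to be informative; with fewer, a rolling-window filter on each venue would be a reasonable alternative.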
Auxiliary Data: Network hash rate estimates from multiple mining pools, difficulty adjustments read from the protocol, and exchange flow data from Chainalysis for validation.
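One way the pool-reported hash rate could be sanity-checked against protocol difficulty is the standard approximation that a block at difficulty D takes about D × 2^32 hashes in expectation. The sketch below is an assumption about how such a check might look, not a description of the validation actually performed; the function names and the 600-second default are illustrative.

```python
def implied_hash_rate(difficulty, avg_block_seconds=600.0):
    """Approximate network hash rate (hashes/second) implied by difficulty.

    Uses the common approximation of difficulty * 2**32 expected hashes per
    block, divided by the average block interval in seconds.
    """
    return difficulty * 2**32 / avg_block_seconds


def relative_gap(reported, implied):
    """Signed relative difference between a reported and an implied rate."""
    return (reported - implied) / implied
```

Passing the observed average block interval for a difficulty epoch, instead of the 600-second target, gives an estimate of realized rather than target hash rate.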